Abstract. We introduce a novel technique for finding real errors in programs. The technique is based on a synergy of three well-known methods: metacompilation, slicing, and symbolic execution. More precisely, we instrument a given program with code that tracks runs of state machines representing various kinds of errors. Next, we slice the program to reduce its size without affecting runs of state machines. Then we symbolically execute the sliced program. Depending on the kind of symbolic execution, the technique can be applied as a stand-alone bug-finding technique or to weed out some false positives from the output of another bug-finding tool. We provide several examples demonstrating the practical applicability of our technique.



1 Introduction 

The title of this paper refers to two popular bug-finding techniques: metacompilation and symbolic execution. The two techniques use completely different principles, leading to different advantages and disadvantages.

Metacompilation [10,26] is a static analysis technique that looks for various kinds of errors specified by state machines. We explain the technique using the state machine SM(x) of Figure 1, which describes errors in lock manipulation. Intuitively, the state machine represents the possible sequences of states of a lock referenced by x along an execution of a program. The state of the lock changes according to a transition of the state machine whenever the execution performs a program statement that syntactically subsumes the label of the transition. We would like to decide whether there exists any program execution in which an instance of the state machine SM(x) assigned to some lock of the analyzed program reaches an error state. Unfortunately, this is not feasible due to the potentially unbounded number of executions and their unbounded length. Hence, we use static analysis to overapproximate the set of reachable states of state machines.
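To make the behaviour of SM(x) along one execution concrete, the following C sketch encodes the machine as a transition table and runs it over the sequence of lock operations performed by a single execution. The names (`delta`, `run`, `is_error`, the `S_` and `E_` constants) are our own. The edges follow Figure 1 and the function smFire of Figure 4; we assume that an event without an edge in Figure 1 (e.g. return in state U) leaves the state unchanged and that error states are absorbing. Tracking one execution like this is easy; the difficulty described above is that there may be unboundedly many executions.

```c
/* States and events of SM(x) from Figure 1. */
enum sm_state { S_U, S_L, S_DU, S_DL, S_RL, NUM_STATES };
enum sm_event { E_LOCK, E_UNLOCK, E_RETURN, NUM_EVENTS };

/* delta[s][e] is the successor of state s on event e. */
static const enum sm_state delta[NUM_STATES][NUM_EVENTS] = {
    [S_U]  = { [E_LOCK] = S_L,  [E_UNLOCK] = S_DU, [E_RETURN] = S_U  },
    [S_L]  = { [E_LOCK] = S_DL, [E_UNLOCK] = S_U,  [E_RETURN] = S_RL },
    /* Assumption: error states are absorbing. */
    [S_DU] = { S_DU, S_DU, S_DU },
    [S_DL] = { S_DL, S_DL, S_DL },
    [S_RL] = { S_RL, S_RL, S_RL },
};

static int is_error(enum sm_state s) {
    return s == S_DU || s == S_DL || s == S_RL;
}

/* Run SM(x) from its initial state U over the lock operations of one
   execution; stop as soon as an error state is reached. */
static enum sm_state run(const enum sm_event *trace, int len) {
    enum sm_state s = S_U;
    for (int i = 0; i < len && !is_error(s); i++)
        s = delta[s][trace[i]];
    return s;
}
```

For example, the trace lock, return ends in RL, while lock, unlock, return ends back in U.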

Let us assume that we want to check the program of Figure 2 for errors specified by the state machine SM(x). First, we find all locks in the program and assign an instance of the state machine to each lock. In our case, there is only one lock, pointed to by L, and thus only one instance SM(L). For each program location, we compute a set overapproximating the possible states of SM(L) after executions leading to the location. Roughly speaking, we initialize the set in the initial location to {U} and the other sets to ∅. Then we repeatedly update the sets according to the effect of individual program statements until a fixed point is reached. The resulting sets for the program of Figure 2 are written directly in the code listing as comments.

Fig. 1. State machine SM(x) describing errors in manipulation of lock x. The nodes U and L refer to the states unlocked and locked, respectively. The other three nodes refer to error states: DU to double unlock, DL to double lock, and RL to return in locked state. The initial node is U.

 1: char *copy(char *dst, char *src, int n, int *L) {
 2:   int i, len;                          // {U}
 3:   len = 0;                             // {U}
 4:   if (src != NULL && dst != NULL) {    // {U}
 5:     len = n;                           // {U}
 6:     lock(L);                           // {L}
 7:   }                                    // {U,L}
 8:   i = 0;                               // {U,L}
 9:   while (i < len) {                    // {U,L}
10:     dst[i] = src[i];                   // {U,L}
11:     i++;                               // {U,L}
12:   }                                    // {U,L}
13:   if (len > 0) {                       // {U,L}
14:     unlock(L);                         // {DU,U}
15:   }                                    // {U,L}
16:   return dst;                          // {U,RL}
17: }

Fig. 2. Function copy copying a source string src into a buffer dst using a lock L to prevent parallel writes.

As we can see, the sets contain two error states: double unlock after the unlock(L) statement and return in locked state in the terminal location. If we analyze the computation of the sets, we can see that the first error corresponds to executions going through lines 1, 2, 3, 4, 8, then iterating the while-loop, and finally passing lines 13, 14. These execution paths are infeasible due to the value of len, which is set to 0 at line 3 and assumed to satisfy len > 0 at line 13. Hence, the first error is a false positive. The second error corresponds to executions passing lines 1, 2, 3, 4, 5, 6, 7, 8, then iterating the while-loop, and finally going through lines 13, 16. All these paths are also infeasible except the one that performs zero iterations of the while-loop, which is the only real execution leading to the only real locking error in the program.
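The fixed-point computation that produces the sets of Figure 2 can be sketched in C. This is a minimal illustration under our own assumptions, not the algorithm of XGCC (which [26] describes as considerably more sophisticated): sets of states are bit masks, the predecessor lists are our hand-written encoding of the control flow of Figure 2 (lines 7 and 15 are the join points after the if-statements, line 9 is the loop header, and line 12 the loop exit), and an error state is recorded at the line where it arises but is not propagated further, which reproduces the sets shown in the comments of Figure 2.

```c
/* States of SM(x) as bits; a set of states is a bit mask. */
enum { U = 1 << 0, L = 1 << 1, DU = 1 << 2, DL = 1 << 3, RL = 1 << 4 };
#define ERRORS (DU | DL | RL)

/* Effect of a program line on SM(L). */
enum kind { OTHER, LOCK, UNLOCK, RETURN };

/* Abstract effect of one statement on a set of (non-error) states. */
static unsigned transfer(enum kind k, unsigned in) {
    unsigned out = 0;
    if (in & U) {
        if (k == LOCK) out |= L;
        else if (k == UNLOCK) out |= DU;
        else out |= U;                 /* OTHER and RETURN keep U */
    }
    if (in & L) {
        if (k == LOCK) out |= DL;
        else if (k == UNLOCK) out |= U;
        else if (k == RETURN) out |= RL;
        else out |= L;
    }
    return out;
}

#define NLINES 16
/* Lines of Figure 2 that fire a transition; all others are OTHER. */
static const enum kind kinds[NLINES + 1] = {
    [6] = LOCK, [14] = UNLOCK, [16] = RETURN };
/* Predecessor lines (0 terminates a list). */
static const int preds[NLINES + 1][3] = {
    [2] = {1}, [3] = {2}, [4] = {3}, [5] = {4}, [6] = {5},
    [7] = {6, 4},        /* join after the first if-statement */
    [8] = {7},
    [9] = {8, 11},       /* loop header: entry and back edge */
    [10] = {9}, [11] = {10},
    [12] = {9},          /* loop exit */
    [13] = {12}, [14] = {13},
    [15] = {14, 13},     /* join after the second if-statement */
    [16] = {15} };

unsigned sets[NLINES + 1];   /* states of SM(L) after each line */

/* Round-robin iteration until no set changes. */
void analyse(void) {
    for (int i = 0; i <= NLINES; i++)
        sets[i] = 0;
    sets[1] = U;             /* initial location */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int line = 2; line <= NLINES; line++) {
            unsigned in = 0;
            for (int p = 0; p < 3 && preds[line][p] != 0; p++)
                in |= sets[preds[line][p]] & ~ERRORS;  /* errors are not propagated */
            unsigned out = transfer(kinds[line], in);
            if (out != sets[line]) {
                sets[line] = out;
                changed = 1;
            }
        }
    }
}
```

After `analyse()`, `sets[14]` is `DU | U` and `sets[16]` is `U | RL`, i.e. the two error reports discussed above; the analysis cannot tell that the path behind the first one is infeasible because it ignores the value of `len`.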

To sum up, metacompilation is highly flexible and fast, and thus applicable to extremely large software projects (e.g. the Linux kernel). It examines all the code and produces many error reports. Unfortunately, some of the reports are false positives. The main source of false positives is the fact that the analysis does not work with data values. In particular, the analysis does not track connections between variable values and states of state machines. This drawback is illustrated by the double unlock false positive detected in the program of Figure 2: the analysis does not know that the condition at line 13 holds only if the state machine SM(L) is in state L.

At this point, we would like to emphasize that metacompilation actually uses a more sophisticated algorithm enriched with many techniques for partial elimination of false positives (see [26] for details). Metacompilation employs a dedicated language for the description of state machines called Metal. The idea of error specification using state machines appears in several tools, including the original implementation of metacompilation called XGCC [55], ESP [11], and Stanse [29].

In contrast to metacompilation, symbolic execution [27] analyzes each execution path separately. Unlike standard execution, symbolic execution replaces input data by symbols representing arbitrary values. Executed statements then manipulate expressions over the symbols rather than exact values. For each execution path, symbolic execution builds a formula called the path condition, which is a necessary and sufficient condition on input data to drive the execution along the path. Whenever a path condition becomes unsatisfiable, the symbolic execution of this path is aborted, as the path is infeasible. The main advantage of symbolic execution is that it works only with feasible executions (assuming that we can decide satisfiability of a path condition), and hence it does not report any false positives. A minor disadvantage is that implementations of this technique usually detect only low-level errors leading to a crash. To detect a specific kind of error, the program has to be modified to reduce the error to a detected one (typically a violation of an assert statement). The main disadvantage of the technique is its high computation cost. In particular, programs containing loops or recursion typically have a large or even infinite number of execution paths and cannot be completely analyzed by symbolic execution. Hence, symbolic execution usually explores only a part of an analyzed program.

In this paper, we introduce a new technique offering the flexibility of metacompilation and the zero false positive rate of symbolic execution. The basic idea is very simple: we use the concept of state machines to get flexibility in error specification. Then we instrument a given program with code that tracks the behaviour of the state machines. The instrumented program is then reduced using the slicing method introduced in [38]. The sliced program must be equivalent to the instrumented program with respect to reachability of error states of the tracked state machines. Note that slicing may remove big portions of the code, including loops and function calls. Hence, an original program with an infinite number of execution paths may be reduced to a program with a finite number of execution paths. Finally, we execute the sliced program symbolically.

Our technique may be used in two ways, depending on the applied symbolic execution tool. If we apply a symbolic executor that prefers to explore more parts of the code (for example, one that explores only the execution paths iterating each program loop at most twice), we may use the technique as a general bug-finding technique reporting only real errors. Note that this approach may miss errors appearing only on unexplored paths. On the other hand, if we use a symbolic executor exploring all execution paths, we may use our technique for basic classification of error reports produced by other tools (e.g. XGCC or Stanse). For each such error report, we may instrument the corresponding code only with the state machine describing the reported error. If our technique finds the same error, it is a real one. If our technique explores all execution paths of the sliced code without detecting the error, it is a false positive. If our technique runs out of resources, we cannot decide whether the error is a real one or just a false positive.

We have developed an experimental tool implementing our technique. The tool instruments a program with a state machine describing locking errors (so far we use a single-purpose instrumentation), then it applies interprocedural slicing to the instrumented code, and finally it passes the sliced code to the symbolic executor Klee [7]. Our experimental results indicate that the technique can indeed classify error reports produced by Stanse applied to the Linux kernel.

We emphasize the synergy of the three known methods combined in the presented technique.

— The errors are specified by state machines (inspired by metacompilation), and a given program is instrumented with code emulating the state machines. This gives us simple slicing criteria: we want to preserve the values of memory places representing states of state machines. Hence, the sliced program contains only the code relevant to the considered errors.

— Slicing may substantially reduce the size of the code, which in turn may remarkably improve the performance of symbolic execution.

— Application of symbolic execution brings us another benefit. While in metacompilation the state machines are associated with syntactic objects (e.g. lock variables appearing in a program), we may associate state machines with actual values of these objects. This leads to a higher precision of error detection, which may potentially result in the detection of real errors missed by metacompilation.

The rest of the paper is organized as follows. Sections 2, 3, and 4 deal with program instrumentation, slicing, and symbolic execution, respectively. The experimental implementation of our technique and some experimental results are discussed in Section 5. Section 6 is devoted to related work, while Section 7 indicates some directions of our future research. Finally, the last section summarizes the presented results.
2 Instrumentation



The purpose of the instrumentation phase of our algorithm is to insert code implementing a state machine into the analysed program. However, the semantics of the program being instrumented must not change. The result of this phase is therefore a new program that keeps the original functionality and simultaneously updates the instrumented state machines. We illustrate the process using the state machine SM(x) of Figure 1 and the program consisting of the function foo of Figure 3 and the function copy of Figure 2. The function foo calls the function copy twice, first with the lock L1 and then with the lock L2. The locks guard writes into the buffers buf1 and buf2, respectively. The function foo is a so-called starting function, i.e. a function where the symbolic execution starts.



char *buf1, *buf2;
int L1, L2;

void foo(char *src, int n) {
  copy(buf1, src, n, &L1);
  copy(buf2, src, n, &L2);
}

Fig. 3. Function foo forms the analysed program together with function copy. 

The instrumentation starts by recognizing the code fragments in the analysed program that manipulate locks. More precisely, we look for all code fragments matching edge labels of the state machine SM(x) of Figure 1. The analysed program contains three such fragments, all of them in the function copy (see Figure 2): the call to lock at line 6, the call to unlock at line 14, and the return statement at line 16.

Next we determine the set of all locks manipulated by the program. From the recognized code fragments, we find out that the pointer variable L in copy is the only program variable through which the program manipulates locks. Using a points-to analysis, we obtain the set {L1, L2} of all locks the program may manipulate.

We introduce a unique instance of the state machine SM(x) for each lock in the set. More precisely, we define two integer variables smL1 and smL2 keeping the current states of the state machines SM(L1) and SM(L2), respectively. Further, we need to specify a mapping from locks to their state machines. The mapping is basically a function (preferably with constant complexity) from addresses of program objects (i.e. the locks) to addresses of the related state machines. Figure 4 shows an implementation of a function smGetMachine that maps the addresses of locks L1 and L2 to the addresses of the related state machines. We note that the implementation of smGetMachine would be more complicated if state machines were associated with dynamically allocated objects.
 1: enum { smU = 0 };         // state U
 2: enum { smL = 1 };         // state L
 3: enum { smDU = 2 };        // state DU
 4: enum { smDL = 3 };        // state DL
 5: enum { smRL = 4 };        // state RL
 6:
 7: enum { smLOCK = 0 };      // transition lock(x)
 8: enum { smUNLOCK = 1 };    // transition unlock(x)
 9: enum { smRETURN = 2 };    // transition return
10:
11: int smL1 = smU, smL2 = smU;
12:
13: int *smGetMachine(int *p) {
14:   if (p == &L1) return &smL1;
15:   if (p == &L2) return &smL2;
16:   return NULL; // unreachable
17: }
18:
19: void smFire(int *SM, int transition) {
20:   switch (*SM) {
21:   case smU:
22:     switch (transition) {
23:     case smLOCK:
24:       *SM = smL;
25:       break;
26:     case smUNLOCK:
27:       assert(false); // double unlock
28:       break;
29:     default: break;
30:     }
31:     break;
32:   case smL:
33:     switch (transition) {
34:     case smLOCK:
35:       assert(false); // double lock
36:       break;
37:     case smUNLOCK:
38:       *SM = smU;
39:       break;
40:     case smRETURN:
41:       assert(false); // return in locked state
42:       break;
43:     default: break;
44:     }
45:     break;
46:   default: break;
47:   }
48: }

Fig. 4. Implementation of the state machine (smFire) and its identification (smGetMachine).
Figure 4 also contains the constants and the function smFire implementing the state machine SM(x). Further, Figure 4 declares the variables smL1 and smL2 and initializes them to the initial state of the state machine. Note that we represent both the states of the machine and the names of transitions by integer constants. Also note that the pointer argument SM of the function smFire points to the instrumented state machine whose transition has to be fired.
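As noted above, smGetMachine becomes more complicated when state machines are associated with dynamically allocated objects, because the set of tracked addresses is not known in advance. One possible design, sketched below purely as an illustration, is a hash table keyed by object address that creates a machine in the initial state on first use, so lookups take expected constant time. The names smGetMachineDyn and smForget, the bucket count, and the hash function are hypothetical and not part of our tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { smU = 0 };           /* initial state U, as in Figure 4 */

#define SM_BUCKETS 256      /* hypothetical table size */

struct sm_entry {
    const void *object;     /* address of the tracked object (e.g. a lock) */
    int state;              /* current state of its state machine */
    struct sm_entry *next;  /* next entry in the same bucket */
};

static struct sm_entry *sm_table[SM_BUCKETS];

static size_t sm_hash(const void *p) {
    uintptr_t a = (uintptr_t)p;
    return (size_t)((a >> 4) ^ (a >> 12)) % SM_BUCKETS;
}

/* Return the state variable of the machine tracking object p; a new
   machine in the initial state is created the first time p is seen. */
int *smGetMachineDyn(const void *p) {
    size_t h = sm_hash(p);
    for (struct sm_entry *e = sm_table[h]; e != NULL; e = e->next)
        if (e->object == p)
            return &e->state;
    struct sm_entry *e = malloc(sizeof *e);
    if (e == NULL)
        abort();
    e->object = p;
    e->state = smU;
    e->next = sm_table[h];
    sm_table[h] = e;
    return &e->state;
}

/* Forget the machine of p, e.g. when the object is freed. */
void smForget(const void *p) {
    struct sm_entry **link = &sm_table[sm_hash(p)];
    while (*link != NULL) {
        if ((*link)->object == p) {
            struct sm_entry *dead = *link;
            *link = dead->next;
            free(dead);
            return;
        }
        link = &(*link)->next;
    }
}
```

In this design the instrumentation would also have to call smForget when an object is freed; otherwise a later allocation at the same address would inherit the state of a dead object.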

It remains to instrument the recognized code fragments in the original program. For each fragment, we know its related transition of the state machine, and we also know which objects the fragment manipulates (if any). Therefore, we first retrieve the address of the state machine related to the manipulated objects (if any) using the function smGetMachine, and then we fire the transition by calling the function smFire. The instrumented version of the original program consists of the code of Figure 4 and the instrumented versions of the original functions foo and copy given in Figure 5, where the instrumented lines are highlighted by *.



  char *buf1, *buf2;
  int L1, L2;

  char *copy(char *dst, char *src, int n, int *L) {
    int i, len;
    len = 0;
    if (src != NULL && dst != NULL) {
      len = n;
*     smFire(smGetMachine(L), smLOCK);
      lock(L);
    }
    i = 0;
    while (i < len) {
      dst[i] = src[i];
      i++;
    }
    if (len > 0) {
*     smFire(smGetMachine(L), smUNLOCK);
      unlock(L);
    }
*   smFire(smGetMachine(L), smRETURN);
    return dst;
  }

  void foo(char *src, int n) {
    copy(buf1, src, n, &L1);
    copy(buf2, src, n, &L2);
  }

Fig. 5. Functions foo and copy instrumented by calls of the smFire function.
3 Slicing



Let us have a look at the instrumented program in Figure 5. We can easily observe that the main part of the function copy, i.e. the loop copying the characters, does not affect the states of the instrumented state machines. Symbolic execution of such code is very expensive. Therefore, we use the slicing technique [38] to eliminate such code from the instrumented program.

The input of the slicing algorithm is a program to be sliced and a set of so-called slicing criteria. A slicing criterion is a pair of a program location and a set of program variables. The slicing algorithm removes program statements that do not affect any slicing criterion. More precisely, for every input passed to both the original and the sliced program, the values of the variable set of each slicing criterion at the corresponding location are always equal in both programs. Our analysis is interested only in the states of the instrumented state machines, especially in locations corresponding to errors. Hence, a slicing criterion is a pair of a location preceding an assert statement in the function smFire and the set of all variables representing the current states of the corresponding state machines. The slicing criteria then consist of all such pairs.



 1: char *buf1, *buf2;
 2: int L1, L2;
 3:
 4: char *copy(char *dst, char *src, int n, int *L) {
 5:   int len;
 6:   len = 0;
 7:   if (src != NULL && dst != NULL) {
 8:     len = n;
 9:     smFire(smGetMachine(L), smLOCK);
10:   }
11:   if (len > 0) {
12:     smFire(smGetMachine(L), smUNLOCK);
13:   }
14:   smFire(smGetMachine(L), smRETURN);
15:   return dst;
16: }
17:
18: void foo(char *src, int n) {
19:   copy(buf1, src, n, &L1);
20:   copy(buf2, src, n, &L2);
21: }

Fig. 6. Functions foo and copy after slicing.

In the instrumented program of Figures 4 and 5, we want to preserve the variables smL1 and smL2. We put slicing criteria at the lines of code detecting transitions of state machines into error states. In other words, the slicing criteria for our running example are the pairs (27, {smL1, smL2}), (35, {smL1, smL2}), and (41, {smL1, smL2}), where the numbers refer to lines in the code of Figure 4. The result of the slicing procedure is presented in Figures 4 and 6 (the code in the former figure is not changed by the slicing). Note that the sliced code contains neither the while-loop nor the lock and unlock commands.
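The effect of these slicing criteria on copy can be reproduced with a deliberately simplified slicer. The sketch below is not Weiser's algorithm [38]: it is a flow-insensitive backward closure over def/use sets and control dependences, which keeps a superset of what a flow-sensitive slicer keeps. The statement table is our own hand-written encoding of the body of copy from Figure 5, in which the state variables smL1 and smL2 are merged into a single variable SM and the buffer contents are modelled as a variable MEM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Variables of copy as bits. */
enum { V_I = 1 << 0, V_LEN = 1 << 1, V_N = 1 << 2, V_SRC = 1 << 3,
       V_DST = 1 << 4, V_L = 1 << 5, V_SM = 1 << 6, V_MEM = 1 << 7 };

struct stmt {
    const char *text;   /* source text, for reference only */
    unsigned def;       /* variables written */
    unsigned use;       /* variables read */
    int ctrl;           /* index of the controlling predicate, or -1 */
};

/* Body of copy from Figure 5, without the final return. */
static const struct stmt prog[] = {
    /*  0 */ {"len = 0;",                           V_LEN, 0,                   -1},
    /*  1 */ {"if (src != NULL && dst != NULL)",    0,     V_SRC | V_DST,       -1},
    /*  2 */ {"len = n;",                           V_LEN, V_N,                  1},
    /*  3 */ {"smFire(smGetMachine(L), smLOCK);",   V_SM,  V_SM | V_L,           1},
    /*  4 */ {"lock(L);",                           0,     V_L,                  1},
    /*  5 */ {"i = 0;",                             V_I,   0,                   -1},
    /*  6 */ {"while (i < len)",                    0,     V_I | V_LEN,         -1},
    /*  7 */ {"dst[i] = src[i];",                   V_MEM, V_DST | V_SRC | V_I,  6},
    /*  8 */ {"i++;",                               V_I,   V_I,                  6},
    /*  9 */ {"if (len > 0)",                       0,     V_LEN,               -1},
    /* 10 */ {"smFire(smGetMachine(L), smUNLOCK);", V_SM,  V_SM | V_L,           9},
    /* 11 */ {"unlock(L);",                         0,     V_L,                  9},
    /* 12 */ {"smFire(smGetMachine(L), smRETURN);", V_SM,  V_SM | V_L,          -1},
};
#define NSTMT (sizeof prog / sizeof prog[0])

/* Mark in keep[] every statement that may affect the criterion variables:
   a statement is kept if it writes a relevant variable; its controlling
   predicates are kept too, and everything they read becomes relevant. */
static void slice(unsigned criterion, bool keep[NSTMT]) {
    unsigned relevant = criterion;
    for (size_t s = 0; s < NSTMT; s++)
        keep[s] = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t s = 0; s < NSTMT; s++) {
            if (keep[s] || !(prog[s].def & relevant))
                continue;
            for (int c = (int)s; c >= 0; c = prog[c].ctrl) {
                if (!keep[c]) {
                    keep[c] = true;
                    relevant |= prog[c].use;
                }
            }
            changed = true;
        }
    }
}
```

With the criterion {SM}, the kept statements are 0, 1, 2, 3, 9, 10, and 12, i.e. the body of copy in Figure 6 except return dst, which this toy encoding omits; the loop (statements 5 to 8) and the calls of lock and unlock are dropped.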

It is important to note that some slicing techniques, including the one in [38] 
that we use, do not consider inputs for which the original program does not halt. 
Therefore, there is no way to guarantee that a sliced program will fail to halt 
whenever the original program fails to halt. This is the only principal source of 
potential false positives in our technique. 

4 Symbolic Execution 

This is the final phase of our technique. We symbolically execute the sliced program from the entry location of the starting function. Symbolic execution explores only feasible program paths. Therefore, if it reaches one of the assertions inside the function smFire, then we have found a bug.

Our running example nicely illustrates the crucial role of slicing in making symbolic execution feasible. Let us first consider symbolic execution of the original program. It starts at the entry location of the function foo and eventually reaches the function copy. Note that the value of the parameter n is symbolic. Therefore, symbolic execution forks into two executions each time it reaches line 9 of Figure 2: one of the executions skips the loop at lines 9-12, while the other enters it. If we assume that n is a 32-bit integer, then the symbolic execution of one call of copy explores more than 2^31 real paths.

By contrast, the sliced program does not contain the loop that generated the huge number of real paths. Therefore, the number of real paths explored by the symbolic execution is exactly 6. Figure 7 shows the symbolic execution tree of the sliced program of Figure 6. We omit vertices corresponding to lines in the called functions smGetMachine and smFire. Note that although the parameter n has a symbolic value, it can only affect the branching at line 11. Moreover, the parameter L always has a concrete value. Therefore, symbolic execution does not fork at branchings inside the functions smGetMachine and smFire. Three of the explored paths are marked with the label bug. These paths reach the assertion for return in locked state (line 41 of Figure 4) in the function smFire called from line 14 of the sliced program. In other words, the paths are witnesses that we can leave the function copy in a locked state. The remaining explored paths of Figure 7 miss the assertions in the function smFire. It means that the original program contains only one locking error, namely return in locked state.
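For a single call of the sliced copy of Figure 6, the path conditions can be written out by hand. The following derivation is our own illustration (Figure 7 covers both calls made by foo); it shows why n only matters at line 11: line 8 assigns len = n, so on the paths through the first branch the condition len > 0 becomes n > 0, while on the other path len = 0 and the condition is simply false. Each path condition is paired with the run of SM(L) it induces.

```latex
\begin{align*}
\pi_1 &= \neg(\mathit{src} \neq \mathrm{NULL} \wedge \mathit{dst} \neq \mathrm{NULL})
  && \mathrm{U} \xrightarrow{\mathit{return}} \mathrm{U}
  && \text{no error}\\
\pi_2 &= \mathit{src} \neq \mathrm{NULL} \wedge \mathit{dst} \neq \mathrm{NULL} \wedge n > 0
  && \mathrm{U} \xrightarrow{\mathit{lock}} \mathrm{L} \xrightarrow{\mathit{unlock}} \mathrm{U} \xrightarrow{\mathit{return}} \mathrm{U}
  && \text{no error}\\
\pi_3 &= \mathit{src} \neq \mathrm{NULL} \wedge \mathit{dst} \neq \mathrm{NULL} \wedge n \le 0
  && \mathrm{U} \xrightarrow{\mathit{lock}} \mathrm{L} \xrightarrow{\mathit{return}} \mathrm{RL}
  && \text{assertion at line 41 of Figure 4}
\end{align*}
```

The three conditions are pairwise disjoint and their disjunction is true, so every input follows exactly one of the paths; since \pi_3 is satisfiable (e.g. n = 0 with non-null buffers), the return-in-locked error is a real one.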

5 Implementation and Experimental Results 

To verify the applicability of the presented technique, we have developed an experimental implementation. Our experimental tool works with programs in C and, for the sake of simplicity, it detects only locking errors described by a state machine very similar to SM(x) of Figure 1. The instances of the state machine are associated with the arguments of lock and unlock function calls. Note that the tool currently works only in cases where each lock is instantiated only once during the run of the symbolic executor. This holds for the vast majority of the code we used; we plan to add support for the remaining cases as well. The main part of our implementation consists of three modules for the Llvm framework [41], namely Prepare, Slicer, and Kleerer. The framework provides us with the C compiler CLANG. We also use an existing symbolic executor for Llvm called Klee [7].

Fig. 7. Symbolic execution tree of the sliced program of Figure 6.

Instrumentation of a given program proceeds in two steps. Using a C preprocessor, the original program is instrumented with calls of the function smFire located just above the statements changing states of state machines. The program is then translated by CLANG into Llvm bytecode [41]. Optimizations are turned off, as required by Klee. The rest of the instrumentation (e.g. adding global variables and changing the code to work with them) is done on the Llvm code by the module Prepare.

The module Slicer implements a variant of the interprocedural slicing algorithm by Weiser [38]. To guarantee correctness and to improve the performance of slicing, the algorithm employs the points-to analysis by Andersen [2].

The module Kleerer performs the final processing of the sliced bytecode before it is passed to Klee. In particular, the module adds to the bytecode a function main that calls the starting function. The main function also allocates symbolic memory for each parameter of the starting function. The size of the allocated memory is determined by the parameter type; when the parameter is a pointer, the size is additionally multiplied by 4000. For example, 4 bytes are allocated for an integer and 16000 bytes for an integer pointer. Further, in the pointer case, we pass a pointer to the middle of the allocated memory (functions might dereference memory at a negative index). The idea behind this is explained in [31]. Finally, the resulting bytecode is symbolically executed by Klee. If a symbolic execution touches memory outside the allocated area, we get a memory error. To remedy this inconvenience, we plan to implement the same on-demand memory handling as UcKlee [31] does.
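The allocation rule above can be sketched in C. This is our own reconstruction from the description, not the code generated by Kleerer: the helper names and the `USE_KLEE` build flag are hypothetical, and the only Klee call used is `klee_make_symbolic(void *addr, size_t nbytes, const char *name)` from `klee/klee.h`. Without `USE_KLEE`, the memory simply stays zeroed, so the sketch also compiles and runs natively.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef USE_KLEE                   /* hypothetical flag: build for Klee */
#include <klee/klee.h>
#define MAKE_SYMBOLIC(p, n, name) klee_make_symbolic((p), (n), (name))
#else                             /* native build: memory stays zeroed */
#define MAKE_SYMBOLIC(p, n, name) ((void)(p), (void)(n), (void)(name))
#endif

/* Pointer parameters get 4000 times the size of the pointed-to type. */
#define POINTER_FACTOR 4000

/* Symbolic value for an int parameter (4 bytes on common targets). */
static int symbolic_int(const char *name) {
    int v = 0;
    MAKE_SYMBOLIC(&v, sizeof v, name);
    return v;
}

/* Symbolic block for a pointer parameter whose pointee has size elem_size.
   The result points to the middle of the block, so the callee may also
   access memory at negative indices. */
static void *symbolic_pointer(size_t elem_size, const char *name) {
    size_t total = elem_size * POINTER_FACTOR;
    char *block = calloc(total, 1);
    if (block == NULL)
        abort();
    MAKE_SYMBOLIC(block, total, name);
    return block + total / 2;
}
```

For the starting function foo(char *src, int n) of Figure 3, the generated main would then amount to `foo(symbolic_pointer(sizeof(char), "src"), symbolic_int("n"));`. Any access further than half a block away from the passed pointer still leaves the allocation, which is the memory error reported as ME in Table 1.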

5.1 Experiments 

We have performed our experiments on several functions of the Linux kernel 2.6.28 for which the static analyzer Stanse reported an error. More precisely, Stanse reported an error trace starting in these functions. We discussed the errors with kernel developers to sort out which are false positives and which are real errors. After slicing, none of the selected functions (nor any function transitively called from them) contains assembler code (in some cases, it has been replaced by equivalent C code) or external function calls.

We ran our experimental tool on these functions. All tests were performed on a machine with an Intel E6850 dual-core processor at 3 GHz and 6 GiB of memory, running Linux. We set the Klee parameters to time out after 10 seconds spent in the SMT solver and after 300 seconds of overall running time. Increasing these limits had no noticeable effect in our environment. We do not pass the optimize option to Klee because it causes Klee to crash on most of the inputs.

Table 1 presents the results of our tool on the selected functions. The table shows the compilation, instrumentation, slicing, symbolic execution, and overall running times. Further, the table presents the ratio of instructions that were sliced away from the instrumented Llvm code. The last two columns specify the result of our analysis and the factual state confirmed by kernel developers. The table shows that symbolic execution is the bottleneck of our technique: it takes the largest share of the running time in most of the experiments, and it is the only phase that ran out of time.

Although the results have no statistical significance, they indicate that the technique can in principle classify error reports produced by other tools like Stanse.
                                           Running Time (s)                        Sliced
File / Function                            Comp.  Instr.  Slic.  SE       Total    (%)     Result  Factual State
fs/jfs/super.c
  jfs_quota_write                          1.25   0.18    0.15   5.09     6.67     67.8    BUG     BUG
drivers/net/qlge/qlge_main.c
  qlge_set_mac_address                     2.70   0.72    26.75  13.28    43.45    66.5    BUG     BUG
drivers/hid/hidraw.c
  hidraw_read                              1.06   0.18    0.14   Timeout  -        67.0    TO      BUG
drivers/net/ns83820.c
  queue_refill                             1.76   0.29    1.72   0.62     4.39     72.9    FP      FP
drivers/usb/misc/sisusbvga/sisusb_con.c
  sisusbcon_set_palette                    1.50   0.24    0.27   17.19    19.20    76.0    FP      FP
fs/jffs2/nodemgmt.c
  jffs2_reserve_space                      1.04   0.18    0.22   Timeout  -        46.8    TO      FP
kernel/kprobes.c
  pre_handler_kretprobe                    0.32   0.09    0.51   2.43     3.35     66.3    ME      FP


Table 1. Experimental results. The table presents the running time of preprocessing and compilation (Comp.), instrumentation including points-to analysis (Instr.), slicing (Slic.), symbolic execution (SE), and the total running time. The column Sliced presents the ratio of instructions sliced away from the instrumented Llvm code. The column Result specifies the result of our tool: BUG means that the tool found a real error, FP means that the analysis finished without finding an error (i.e. the original error report is a false positive), TO means that the symbolic execution did not finish in time, and ME denotes an occurrence of a memory error. The last column specifies the factual state of the error report.



If our technique reports an error, it is a real one. If it finishes the analysis without detecting any error, the original error report is a false positive. The analysis may also fail to finish in the given time, which is usually caused by loops in the sliced code. Finally, it may report the memory error mentioned above.
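The interpretation of a single run can be summarized as a small decision function. The record type and its fields are a hypothetical summary (Klee does not report a run in this form), and giving a found assertion failure precedence over a memory error on another path is our own choice; the verdicts are those of the Result column of Table 1. As noted in Section 3, the FP verdict additionally assumes that slicing did not remove a non-terminating computation.

```c
#include <stdbool.h>

/* Hypothetical summary of one Klee run on a sliced, instrumented program. */
struct se_run {
    bool assertion_failed;    /* some path reached an assert in smFire */
    bool memory_error;        /* some path left the memory allocated for parameters */
    bool all_paths_explored;  /* exploration finished within the time limit */
};

/* Verdicts as used in the Result column of Table 1. */
enum verdict { BUG, FP, TO, ME };

static enum verdict classify(struct se_run r) {
    if (r.assertion_failed)
        return BUG;   /* a feasible path reaches an error state */
    if (r.memory_error)
        return ME;    /* some paths were cut short: no verdict */
    if (r.all_paths_explored)
        return FP;    /* no path of the sliced program reaches an error */
    return TO;        /* out of time: the report stays unclassified */
}
```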

6 Related Work 

A description of program properties in the Metal language and meta-level compilation is discussed in [9,10,13,25]. The technique presented in [10] found thousands of bugs in real system code. It provides an easy description of the properties to be checked and a fast analysis. Nevertheless, it suffers from false positives. Since the false positive rate has a huge impact on practical usability, an important part of the technique consists of false positive suppression algorithms like killing variables and expressions, synonyms, false path pruning, and others. Besides the suppression algorithms, bug reports from the tool are further ranked according to their probability of being real errors. There are generic and statistical ranking algorithms ordering bug reports. An extension introduced in [14] provides automatic inference of some temporal properties based on statistical analysis of assumed programmer beliefs. The ESP [11] technique uses a language similar to Metal for the description of properties. It implements an interprocedural dataflow algorithm based on [32] for error detection and an abstract simulation pruning algorithm for false positive suppression. Stanse [29], a static analysis tool, also uses state machines for the description of checked program properties. The description is based on parametrised abstract syntax trees. Although this tool found hundreds of real bugs in the Linux kernel, it suffers from a high false positive rate since its false positive suppression algorithms are very limited.

Program analysis tools based on symbolic execution [27] mainly discover low-level bugs like division by zero, illegal memory access, assertion failure, etc. These tools typically do not have problems with false positives, but they have problems with scalability to large programs. Many techniques have been developed to improve scalability to programs used in practice. Modern techniques are mostly hybrid: they usually combine symbolic execution with concrete execution [17,18,19,20,34,36]. There are also hybrid techniques combining symbolic execution with a complementary static analysis [3,22,23,24,28]. Symbolic execution can be accelerated by a compositional approach based on function summaries [1,15]. Another approach to effective symbolic execution, introduced in [6,7,8], is based on recording already seen behaviour and pruning its repetition. The following techniques focus on reaching a specific program location. Fitnex [39], a search strategy implemented in Pex [36], guides path exploration to a particular target location using a fitness function. The function measures how close an already discovered feasible path is to the target. The LESE [33] approach introduces symbolic variables for the number of times each loop was executed. The symbolic variables are linked with features of a known grammar generating inputs. Using these links, the grammar can control the numbers of loop iterations performed on a generated input. A technique presented in [21] analyses loops on the fly, i.e. during simultaneous concrete and symbolic executions of a program for a concrete input. The loop analysis infers variables that are modified by a constant value in each loop iteration. These variables are used to build loop summaries expressed in the form of pre- and postconditions. An algorithm in [35] constructs a nontrivial necessary condition on input values to drive the program execution to a given location. A technique presented in [30] introduces, for each recurrent variable, a pair of counters for two different paths around a loop. Each counter keeps information about the number of iterations around one path since the last iteration around the other one. Finally, there is an orthogonal line of research which tries to improve symbolic execution for programs with some special types of inputs. Some techniques deal with programs manipulating strings [5,40], and other techniques reduce the input space using a given input grammar [16,33].

Interprocedural static slicing was introduced by Weiser [38]. Nowadays, there are many different approaches to program slicing; they are surveyed by several authors [4,12,37]. Applications of slicing include program debugging, reverse engineering, and regression testing [25].
7 Future Work



Our future work has basically three independent directions. 

First, we plan to run our tool to classify all lock-related error reports produced by Stanse on the Linux kernel. The results should give a better picture of the practical applicability of the technique. To obtain relevant data, we have to solve some practical issues like correct detection of starting functions, automatic replacement of assembler, treatment of external function calls, etc. We should also implement on-demand memory allocation in Klee, as discussed in Section 5, or use a different executor.

The second direction is to adopt or design a convenient way to specify arbitrary state machines. It may be a dedicated language similar to Metal. Then we plan to implement an instrumentation handling these state machines. In particular, the instrumentation should correctly handle state machines associated with dynamically allocated objects.

Finally, we would also like to examine the performance of our technique as a stand-alone error-detection tool. To this end, we have to use a symbolic executor aiming for maximal code coverage. In particular, such an executor has to suppress execution paths that differ from already explored paths only in the number of loop iterations. Unfortunately, we are not aware of any publicly available symbolic executor of this kind. However, it seems that UcKlee [31] (which is not public as of now) has been designed for a similar purpose.

8 Conclusion 

We have presented a novel technique combining three standard methods: specification of errors with state machines, slicing, and symbolic execution. We are currently not aware of any technique combining any two of the three methods. We have discussed the synergy of the three methods. Moreover, our experimental results indicate that the technique can recognize some false positives and some real errors in error reports produced by other error-detection tools.